1.  Introduction

If one observes the real random variables $X_1, \ldots, X_n$, independently normally distributed with unknown means $\xi_1, \ldots, \xi_n$ and variance 1, it is customary to estimate $\xi_i$ by $X_i$. If the loss is the sum of squares of the errors, this estimator is admissible for $n \le 2$, but inadmissible for $n \ge 3$. Since the usual estimator is best among those which transform correctly under translation, any admissible estimator for $n \ge 3$ involves an arbitrary choice. While the results of this paper are not in a form suitable for immediate practical application, the possible improvement over the usual estimator seems to be large enough to be of practical importance if $n$ is large.

Let $X$ be a random $n$-vector whose expected value is the completely unknown vector $\xi$ and whose components are independently normally distributed with variance 1. We consider the problem of estimating $\xi$ with the loss function $L$ given by

(1)  $L(\xi, d) = (\xi - d)^2 = \sum_i (\xi_i - d_i)^2$

where $d$ is the vector of estimates. In section 2 we give a short proof of the inadmissibility of the usual estimator

(2)  $d = \hat\xi_0(X) = X$,

for $n \ge 3$. For $n = 2$, the admissibility of $\hat\xi_0$ is proved in section 4. For $n = 1$ the admissibility of $\hat\xi_0$ is well known (see, for example, [1], [2], [3]) and also follows from the result for $n = 2$. Of course, all of the results concerning this problem apply with obvious modifications if the assumption that the components of $X$ are independently distributed with variance 1 is replaced by the condition that the covariance matrix $\Sigma$ of $X$ is known and nonsingular and the loss function (1) is replaced by

(3)  $L(\xi, d) = (\xi - d)'\,\Sigma^{-1}(\xi - d)$.

We shall give immediately below a heuristic argument indicating that the usual estimator may be poor if $n$ is large. With some additional precision, this could be made to yield a discussion of the infinite dimensional case or a proof that for sufficiently large $n$ the usual estimator is inadmissible. We choose an arbitrary point in the sample space independent of the outcome of the experiment and call it the origin. Of course, in the way we have expressed the problem this choice has already been made, but in a correct coordinate-free presentation, it would appear as an arbitrary choice of one point in an affine space. Now

(4)  $X^2 = (X - \xi)^2 + \xi^2 + 2\sqrt{\xi^2}\,Z$

where

(5)  $Z = \dfrac{\xi'(X - \xi)}{\sqrt{\xi^2}}$

has a univariate normal distribution with mean 0 and variance 1, and for large $n$, we have $(X - \xi)^2 = n + O_p(\sqrt{n})$, so that

(6)  $X^2 = n + \xi^2 + O_p\!\left(\sqrt{n + \xi^2}\right)$

uniformly in $\xi$. (For the stochastic order notation $o_p$, $O_p$, see [4].) Consequently when we observe $X^2$ we know that $\xi^2$ is nearly $X^2 - n$. The usual estimator would have us estimate $\xi$ to lie outside of the convex set $\{\xi : \xi^2 \le X^2 - cn\}$ (with $c$ slightly less than 1) although we are practically sure that $\xi$ lies in that set. It certainly seems more reasonable to cut $X$ down at least by a factor of $[(X^2 - n)/X^2]^{1/2}$ to bring the estimate within that sphere. Actually, because of the curvature of the sphere combined with the uncertainty of our knowledge of $\xi$, the best factor, to within the approximation considered here, turns out to be $(X^2 - n)/X^2$. For, consider the class of estimators

(7)  $\hat\xi(X) = \left[1 - h\!\left(\dfrac{X^2}{n}\right)\right] X$

where $h$ is a continuous real-valued function with $\lim |t\,h(t)| < \infty$. We have (with $\rho^2 = \xi^2/n$)

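(8)  $[\hat\xi(X) - \xi]^2 = \left\{\left[1 - h\!\left(\frac{X^2}{n}\right)\right] X - \xi\right\}^2$

$= \left[1 - h\!\left(\frac{X^2}{n}\right)\right]^2 (X - \xi)^2 + \xi^2\,h^2\!\left(\frac{X^2}{n}\right) - 2\sqrt{\xi^2}\,h\!\left(\frac{X^2}{n}\right)\left[1 - h\!\left(\frac{X^2}{n}\right)\right] Z$

$= n\,[\,1 - 2h(1 + \rho^2) + (1 + \rho^2)\,h^2(1 + \rho^2)\,] + O_p(\sqrt{n})$.

This (without the remainder) attains its minimum of $n\rho^2/(1 + \rho^2)$ for $h(1 + \rho^2) = 1/(1 + \rho^2)$. In these calculations we have not used the normality.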
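
As a quick numerical check of the minimization in (8), the following sketch (not part of the original argument) evaluates the leading term $1 - 2h + (1 + \rho^2)h^2$ on a grid of candidate values of $h$ and compares the grid minimum with the closed form $\rho^2/(1 + \rho^2)$ attained at $h = 1/(1 + \rho^2)$. The function names, the search range $[0, 1]$, and the grid resolution are illustrative assumptions.

```python
def leading_risk_per_dim(h, rho2):
    """Leading term of (8) divided by n: 1 - 2h + (1 + rho^2) h^2."""
    return 1.0 - 2.0 * h + (1.0 + rho2) * h * h


def grid_minimizer(rho2, steps=10000):
    """Brute-force search over h in [0, 1] (an illustrative choice of range)."""
    best_h, best_val = 0.0, leading_risk_per_dim(0.0, rho2)
    for k in range(1, steps + 1):
        h = k / steps
        val = leading_risk_per_dim(h, rho2)
        if val < best_val:
            best_h, best_val = h, val
    return best_h, best_val


if __name__ == "__main__":
    for rho2 in (0.25, 1.0, 4.0):
        h_star, val = grid_minimizer(rho2)
        # Columns: rho^2, grid minimizer, 1/(1+rho^2), grid minimum, rho^2/(1+rho^2)
        print(rho2, h_star, 1.0 / (1.0 + rho2), val, rho2 / (1.0 + rho2))
```

The grid minimizer should agree with $1/(1 + \rho^2)$ up to the grid spacing, which is the shrinkage factor $(X^2 - n)/X^2$ of the heuristic once $X^2/n$ is replaced by $1 + \rho^2$.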

In section 3 we consider some of the properties of spherically symmetric estimators, that is, estimators of the form (7), for finite $n$. We show that a spherically symmetric estimator is admissible provided it is admissible as compared with other spherically symmetric estimators. This is essentially a special case of a result given by Karlin [11] and Kudo [12].

In section 4 we use the information inequality in the manner of [1] and [2] in order to obtain lower bounds to the mean squared error of a spherical estimator of the mean. In particular, for $n = 2$ this proves the admissibility of the usual estimator. For $n \ge 3$ we obtain the bound $(n - 2)^2/\xi^2$ for the asymptotic value of the possible improvement as $\xi^2 \to \infty$, which is proved to be attainable in section 2.

In accordance with the results of section 3, a good spherically symmetric estimator is admissible for any $n$. However, roughly speaking, as $n \to \infty$ it becomes less and less admissible, as in Robbins [7]. A simple way to obtain an estimator which is better for most practical purposes is to represent the parameter space (which is also essentially the sample space) as an orthogonal direct sum of two or more subspaces, also of large dimension, and apply spherically symmetric estimators separately in each. If the $\rho^2$'s (squared length of the population mean divided by the dimension) are appreciably different for the selected subspaces, this estimator will be better than the spherically symmetric one. It is unlikely that this estimator is admissible unless Bayes solutions (in the strict sense) are used in the component subspaces, but it is also unlikely that its departure from admissibility is important in practice.
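
To make the comparison concrete, the sketch below uses the large-$n$ approximation $n\rho^2/(1 + \rho^2)$ from (8) for the loss of the best spherically symmetric estimator, and compares applying it once to the whole space with applying it separately in two orthogonal subspaces. The dimensions and squared mean lengths are made-up illustrative numbers, the function names are hypothetical, and the approximation ignores the $O_p(\sqrt{n})$ remainder.

```python
def approx_loss(dim, xi2):
    """Leading-order loss dim * rho^2 / (1 + rho^2), with rho^2 = xi2 / dim."""
    rho2 = xi2 / dim
    return dim * rho2 / (1.0 + rho2)


def pooled_vs_split(dims, xi2s):
    """Return (loss treating the whole space at once, loss with separate subspaces)."""
    pooled = approx_loss(sum(dims), sum(xi2s))
    split = sum(approx_loss(d, x) for d, x in zip(dims, xi2s))
    return pooled, split


if __name__ == "__main__":
    # Two subspaces of dimension 500: small mean in one, large mean in the other.
    print(pooled_vs_split([500, 500], [50.0, 4500.0]))
    # Equal rho^2 in both subspaces: splitting changes nothing to this order.
    print(pooled_vs_split([500, 500], [1000.0, 1000.0]))
```

In this approximation the split loss never exceeds the pooled loss, because $dx/(d + x)$ is superadditive in $(d, x)$; the gain vanishes when the $\rho^2$'s coincide.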

In  section  5  we  consider  very  briefly  a  number  of  problems  for  which  answers  are 
needed  before  the  methods  of  this  paper  can  be  applied  with  confidence. 

2.  Inadmissibility  of  the  usual  estimator 

For $n \ge 3$, let $X$ be a normally distributed random $n$-vector with unknown mean $\xi$ and covariance matrix $I$, the identity matrix. In addition to the usual estimator $\hat\xi_0$ given by

(9)  $\hat\xi_0(X) = X$,

we shall consider the estimator $\hat\xi_1$ given by

(10)  $\hat\xi_1(X) = \left(1 - \dfrac{b}{a + X^2}\right) X$

with $a, b > 0$. We shall show that for sufficiently small $b$ and large $a$, $\hat\xi_1$ is strictly better than $\hat\xi_0$, in fact,

(11)  $E_\xi[\hat\xi_1(X) - \xi]^2 < n = E_\xi[\hat\xi_0(X) - \xi]^2$

for all $\xi$. To prove (11) let $X = Y + \xi$ so that $Y$ is normally distributed with mean 0 and covariance matrix $I$, the identity. Then


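(12)  $E_\xi[\hat\xi_1(X) - \xi]^2 = E\left[\left(1 - \dfrac{b}{a + (Y + \xi)^2}\right)(Y + \xi) - \xi\right]^2$

$= n - 2b\,E\,\dfrac{Y'(Y + \xi)}{a + (Y + \xi)^2} + b^2\,E\,\dfrac{(Y + \xi)^2}{[a + (Y + \xi)^2]^2}$

$< n - 2b\,E\,\dfrac{Y'(Y + \xi) - b/2}{a + (Y + \xi)^2}$.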

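From the identity

(13)  $\dfrac{1}{1 + x} = 1 - x + \dfrac{x^2}{1 + x}$

we find that

(14)  $\dfrac{1}{a + (Y + \xi)^2} = \dfrac{1}{a + Y^2 + \xi^2}\left\{1 - \dfrac{2\,\xi'Y}{a + Y^2 + \xi^2} + \dfrac{4(\xi'Y)^2}{(a + Y^2 + \xi^2)[a + (Y + \xi)^2]}\right\}$.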
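
The expansion (14) is an exact algebraic identity, obtained from (13) with $x = 2\xi'Y/(a + Y^2 + \xi^2)$, and can be spot-checked numerically. The sketch below is only such a check; the function names and the sample vectors are arbitrary illustrative choices.

```python
def direct_reciprocal(a, y, xi):
    """Left-hand side of (14): 1 / (a + |y + xi|^2)."""
    shifted = sum((yi + xii) ** 2 for yi, xii in zip(y, xi))
    return 1.0 / (a + shifted)


def expanded_reciprocal(a, y, xi):
    """Right-hand side of (14), built from (13) with x = 2 xi'y / (a + |y|^2 + |xi|^2)."""
    y2 = sum(v * v for v in y)
    xi2 = sum(v * v for v in xi)
    dot = sum(yi * xii for yi, xii in zip(y, xi))
    d = a + y2 + xi2
    d_shift = d + 2.0 * dot  # equals a + |y + xi|^2
    return (1.0 / d) * (1.0 - 2.0 * dot / d + 4.0 * dot * dot / (d * d_shift))


if __name__ == "__main__":
    a, y, xi = 3.0, [0.5, -1.0, 2.0], [1.0, 2.0, -0.5]
    # The two printed values should agree up to rounding error.
    print(direct_reciprocal(a, y, xi), expanded_reciprocal(a, y, xi))
```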

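Since the conditional mean of $\xi'Y$ given $Y^2$ is 0, we find from (14) that

(15)  $E\,\dfrac{Y'(Y + \xi) - b/2}{a + (Y + \xi)^2} = E\,\dfrac{Y^2 - b/2}{a + Y^2 + \xi^2} - 2E\,\dfrac{E[(\xi'Y)^2 \mid Y^2]}{(a + Y^2 + \xi^2)^2} + 4E\,\dfrac{(\xi'Y)^2\,[Y'(Y + \xi) - b/2]}{(a + Y^2 + \xi^2)^2[a + (Y + \xi)^2]}$.

Now

(16)  $E[(\xi'Y)^2 \mid Y^2] = \dfrac{\xi^2}{n}\,Y^2 < \dfrac{a + Y^2 + \xi^2}{n}\,Y^2$

so that

(17)  $E\,\dfrac{Y^2 - b/2}{a + Y^2 + \xi^2} - 2E\,\dfrac{E[(\xi'Y)^2 \mid Y^2]}{(a + Y^2 + \xi^2)^2} \ge E\,\dfrac{(1 - 2/n)\,Y^2 - b/2}{a + Y^2 + \xi^2}$

$\ge E\left[\dfrac{n - 2}{n}\,\dfrac{Y^2}{a + \xi^2}\left(1 - \dfrac{Y^2}{a + \xi^2}\right) - \dfrac{b}{2(a + \xi^2)}\right]$

$= \dfrac{n - 2 - b/2}{a + \xi^2} - \dfrac{(n - 2)(n + 2)}{(a + \xi^2)^2}$.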

It is intuitively clear that the last term on the right-hand side of (15) is $o[1/(a + \xi^2)]$ uniformly in $\xi$ as $a \to \infty$. To give a detailed proof we observe that for $\xi^2 \le a$


(18)  E 


(^F) 


(a+F^-f  {2)Ma+(F+^)2] 


^  -E 


?F 


(a+F2+HM«+(F+^)2] 
c 


For $\xi^2 > a$,
(19)  E 


(^F)^ 


(a+Y^+^H^  l<x+(F+n^] 


^  -E 


(a+{2)2a-  (o-f^2)a 

Ui"!' 


fe  -£ 


(a+  K2+{2)»(o+  (F+{)>) 

Ul'l’ 

(fl+F*+{*)Mo  +  i{»l 

JF" 


n 


^  - 


(a+|2)V2  (o+$2)o 


Combining  (18)  and  (19)  we  have 


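(20)  $E\,\dfrac{(\xi'Y)^2\,Y'(Y + \xi)}{(a + Y^2 + \xi^2)^2[a + (Y + \xi)^2]} = o\!\left(\dfrac{1}{a + \xi^2}\right)$

uniformly in $\xi$ as $a \to \infty$. Also

(21)  $-\dfrac{b}{2}\,E\,\dfrac{(\xi'Y)^2}{(a + Y^2 + \xi^2)^2[a + (Y + \xi)^2]} \ge -\dfrac{b\,\xi^2}{2(a + \xi^2)^2 a} = o\!\left(\dfrac{1}{a + \xi^2}\right)$

uniformly in $\xi$ as $a \to \infty$.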

Thus from (12), (15), (17), (20), and (21) we find that

(22)  $E_\xi[\hat\xi_1(X) - \xi]^2 < n - 2b\left[\dfrac{n - 2 - b/2}{a + \xi^2} + o\!\left(\dfrac{1}{a + \xi^2}\right)\right]$

uniformly in $\xi$ as $a \to \infty$. Consequently if we take $0 < b < 2(n - 2)$, and $a$ sufficiently large, this will be less than $n$ for all $\xi$. If we take $b = n - 2$, then as $\xi^2 \to \infty$, the improvement over the risk of the usual estimator is asymptotic to $(n - 2)^2/\xi^2$. In section 4 we shall see that this is asymptotically the best possible improvement over the usual estimator in the neighborhood of $\xi^2 = \infty$.
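
The inequality (11) can be illustrated by simulation. The sketch below is a minimal Monte Carlo comparison of the usual estimator with an estimator of the form $(1 - b/(a + X^2))X$, using only the Python standard library. The dimension $n = 10$, the constants $a = 5$ and $b = n - 2$, the sample size, and the seeds are illustrative assumptions; the argument above only guarantees the improvement for sufficiently large $a$.

```python
import random


def shrinkage_estimate(x, a, b):
    """Multiply x by the factor 1 - b / (a + |x|^2)."""
    factor = 1.0 - b / (a + sum(v * v for v in x))
    return [factor * v for v in x]


def simulated_risks(xi, a, b, trials=10000, seed=0):
    """Monte Carlo risks (usual estimator, shrinkage estimator) at the mean vector xi.

    Both estimators are evaluated on the same simulated samples.
    """
    rng = random.Random(seed)
    usual = shrunk = 0.0
    for _ in range(trials):
        x = [m + rng.gauss(0.0, 1.0) for m in xi]
        est = shrinkage_estimate(x, a, b)
        usual += sum((xv - m) ** 2 for xv, m in zip(x, xi))
        shrunk += sum((ev - m) ** 2 for ev, m in zip(est, xi))
    return usual / trials, shrunk / trials


if __name__ == "__main__":
    n = 10
    for xi in ([0.0] * n, [1.0] * n, [3.0] * n):
        # Prints the squared length of xi and the pair of simulated risks.
        print(sum(m * m for m in xi), simulated_risks(xi, a=5.0, b=n - 2.0))
```

The simulated risk of the usual estimator should be close to $n$ at every $\xi$, while the gain from shrinkage is largest near the origin and decays as $\xi^2$ grows.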
3.  Spherically  symmetrical  estimators

We shall say that an estimator $\hat\xi$ is spherically symmetrical (about the origin) if it is of the form

(23)  $\hat\xi(x) = [1 - h(x^2)]\,x$

where $h$ is a real-valued function. This is equivalent to requiring that for every orthogonal transformation $g$, $g \circ \hat\xi \circ g^{-1} = \hat\xi$, that is, for all $x$

(24)  $g[\hat\xi(g^{-1}x)] = \hat\xi(x)$.

First, if $\hat\xi$ is of the form (23), then

(25)  $g[\hat\xi(g^{-1}x)] = g\{[1 - h(x^2)]\,g^{-1}x\} = \hat\xi(x)$.

Suppose conversely that $\hat\xi$ satisfies (24) for all orthogonal $g$. In particular, for those $g$ which are reflections in a subspace containing $x$, $g[\hat\xi(x)] = \hat\xi(x)$. Consequently $\hat\xi(x)$ lies along $x$, that is,

(26)  $\hat\xi(x) = [1 - h'(x)]\,x$

for some real-valued function $h'$. Since a vector $x$ can be taken into any other vector having the same squared length $x^2$ by an orthogonal transformation, this yields (23).
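
The equivariance condition (24) is easy to test numerically for a particular orthogonal map. The sketch below uses a Householder reflection (an orthogonal transformation equal to its own inverse) and an arbitrary choice of $h$; all function names, the reflection direction, and the test vector are illustrative assumptions rather than anything taken from the argument above.

```python
def householder(v):
    """Return the reflection x -> x - 2 v (v.x) / (v.v), which is orthogonal and self-inverse."""
    vv = sum(c * c for c in v)

    def reflect(x):
        coef = 2.0 * sum(a * b for a, b in zip(v, x)) / vv
        return [xc - coef * vc for xc, vc in zip(x, v)]

    return reflect


def spherical_estimator(h):
    """Build the estimator x -> [1 - h(|x|^2)] x for a given real function h."""
    def estimate(x):
        factor = 1.0 - h(sum(c * c for c in x))
        return [factor * c for c in x]

    return estimate


def max_equivariance_gap(estimate, g, g_inv, x):
    """Largest coordinate of |g(estimate(g_inv(x))) - estimate(x)|; zero when (24) holds at x."""
    lhs = g(estimate(g_inv(x)))
    rhs = estimate(x)
    return max(abs(l - r) for l, r in zip(lhs, rhs))


if __name__ == "__main__":
    g = householder([1.0, -2.0, 0.5, 3.0])                # arbitrary reflection direction
    est = spherical_estimator(lambda t: 2.0 / (1.0 + t))  # arbitrary choice of h
    print(max_equivariance_gap(est, g, g, [0.3, 1.2, -0.7, 2.0]))
```

An estimator that treats one coordinate differently, for example by halving only the first coordinate, would produce a clearly nonzero gap under the same reflection.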

We shall show that if a spherically symmetric estimator $\hat\xi_2$ is admissible as compared with all other spherically symmetric estimators, then it is admissible (in the class of all estimators). The proof is based on the compactness of the orthogonal group $Q$, and the continuity of the problem. It is similar to a proof for finite groups (see p. 228, [5], and p. 198, [6]). We shall only sketch the proof since a general result for compact groups will appear elsewhere. Because of the convexity of the loss function (1) in the estimate $d$ we can confine our attention to nonrandomized procedures (see p. 186, [8]).

Suppose the estimator $\hat\xi$ is strictly better than the spherically symmetric estimator $\hat\xi_2$, that is,

(27)  $R_{\hat\xi}(\xi) = E_\xi[\hat\xi(X) - \xi]^2 \le E_\xi[\hat\xi_2(X) - \xi]^2 = R_{\hat\xi_2}(\xi)$

for all $\xi$ with strict inequality for some $\xi$. Because of the continuity of $R_{\hat\xi}$ and $R_{\hat\xi_2}$, strict inequality will hold for $\xi$ in some nonempty open set $S$. Since $\hat\xi_2$ is spherically symmetric, (27) will remain true if $\hat\xi$ is replaced by $g \circ \hat\xi \circ g^{-1}$ with $g$ orthogonal; in fact,

(28)  $R_{g \circ \hat\xi \circ g^{-1}}(\xi) = E_\xi\{g[\hat\xi(g^{-1}X)] - \xi\}^2 = E_\xi[\hat\xi(g^{-1}X) - g^{-1}\xi]^2 = R_{\hat\xi}(g^{-1}\xi)$.

Thus, for fixed $\xi \in S$, the set of $g$ for which $R_{g \circ \hat\xi \circ g^{-1}}(\xi) < R_{\hat\xi_2}(\xi)$ will be a nonempty open set. Let $\mu$ be the invariant probability measure on $Q$ which assigns strictly positive measure to any nonempty open set (for the existence of such a measure see chapter 2, [10]). Then

(29)  $\hat\xi' = \int g \circ \hat\xi \circ g^{-1}\,d\mu(g)$

is spherically symmetric, and because of the convexity of the loss function (1) in $d$

(30)  $R_{\hat\xi'}(\xi) \le \int R_{g \circ \hat\xi \circ g^{-1}}(\xi)\,d\mu(g) \le R_{\hat\xi_2}(\xi)$

with strict inequality for $\xi \in S$. This shows that $\hat\xi_2$ is not admissible in the class of all spherically symmetric estimators and completes the proof.
4.  Application  of  the  information  inequality

In this section we apply the information inequality, as in [1] and [2], to obtain an upper bound for the possible improvement of a spherically symmetric estimator over the usual one. In particular, with the aid of the result of section 3, we show that for $n = 2$ the usual estimator is admissible.

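Let $\hat\xi$ be any estimator of $\xi$ with everywhere finite risk $R$ and let $b$ be the bias of $\hat\xi$, that is,

(31)  $b(\xi) = E_\xi\,\hat\xi(X) - \xi$.

Then by the information inequality

(32)  $R(\xi) \ge b^2(\xi) + \sum_i \left\{\sum_j \eta_{ij}\,[\delta_{ij} + b_{ij}(\xi)]\right\}^2$

for any $\eta$ with $\sum_j \eta_{ij}^2 = 1$ for all $i$, where $\delta_{ij} = 1$ if $i = j$, 0 otherwise and

(33)  $b_{ij}(\xi) = \dfrac{\partial b_i(\xi)}{\partial \xi_j}$

with $b_i(\xi)$ the $i$th coordinate of $b(\xi)$. Choosing

(34)  $\eta_{ij} = \dfrac{\delta_{ij} + b_{ij}(\xi)}{\left\{\sum_k [\delta_{ik} + b_{ik}(\xi)]^2\right\}^{1/2}}$

so as to maximize the right-hand side of (32), we find

(35)  $R(\xi) \ge b^2(\xi) + \sum_{i,j} [\delta_{ij} + b_{ij}(\xi)]^2 = b^2(\xi) + n + 2\sum_i b_{ii}(\xi) + \sum_{i,j} b_{ij}^2(\xi)$.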

In the spherically symmetrical case where $\hat\xi$ has the form (23), $b$ has the form

(36)  $b(\xi) = -\varphi(\xi^2)\,\xi$

where $\varphi$ is a differentiable real-valued function. In this case, dropping the last term, (35) becomes

(37)  $R(\xi) \ge n + \xi^2\varphi^2(\xi^2) - 2n\varphi(\xi^2) - 4\xi^2\varphi'(\xi^2)$.

We first use (37) to prove that for $n = 2$ the usual estimator $\hat\xi_0$ given by (2) is admissible. By the results of section 3, if $\hat\xi_0$ is not admissible there exists a spherically symmetric estimator $\hat\xi$ which is strictly better, and therefore there exists a function $\varphi$ not vanishing identically such that

(38)  $2 \ge 2 + \xi^2\varphi^2(\xi^2) - 4\varphi(\xi^2) - 4\xi^2\varphi'(\xi^2)$

for all $\xi^2 > 0$. Letting $t = \xi^2$ and $\psi(t) = t\varphi(t)$ we find

(39)  $0 \ge \dfrac{1}{t}\,\psi^2(t) - 4\psi'(t)$

for $t > 0$. This shows that $\psi$ is a nondecreasing function. We shall show that (39) implies that $\psi$ is identically 0. Suppose first that $\psi(t_0) < 0$ for some $t_0 > 0$. Then integrating the inequality

(40)  $\dfrac{\psi'(t)}{\psi^2(t)} \ge \dfrac{1}{4t}$

from $t < t_0$ to $t_0$ we obtain

(41)  $\dfrac{1}{\psi(t)} - \dfrac{1}{\psi(t_0)} \ge \dfrac{1}{4}\log\dfrac{t_0}{t}$.

The left-hand side is bounded as $t \to 0$ whereas the right-hand side approaches $+\infty$ so that this is a contradiction. If on the other hand $\psi(t_0) > 0$ for some $t_0 > 0$, then

(42)  $\dfrac{1}{\psi(t_0)} - \dfrac{1}{\psi(t)} \ge \dfrac{1}{4}\log\dfrac{t}{t_0}$

for all $t > t_0$. As $t \to \infty$, the left-hand side is bounded and the right-hand side approaches $+\infty$ so that we again have a contradiction.

Next we shall apply (35) to show that for $n \ge 3$ there cannot exist $c > (n - 2)^2$ and $t_0$ such that for all $\xi^2 \ge t_0$

(43)  $R(\xi) \le n - \dfrac{c}{\xi^2}$.

[We have seen in section 2 that there is an estimator which yields an improvement over the usual estimator asymptotic to $(n - 2)^2/\xi^2$ as $\xi^2 \to \infty$.] It will suffice to show that the differential inequality

(44)  $n - \dfrac{c}{\xi^2} \ge n + \xi^2\varphi^2(\xi^2) - 2n\varphi(\xi^2) - 4\xi^2\varphi'(\xi^2)$,

obtained by combining (37) and (43), has no solution valid for all $\xi^2 \ge t_0$. To see that (44) has no solution, let

(45)  $\varphi(\xi^2) = \dfrac{n - 2}{\xi^2} + f(\xi^2)$.

Then (44) becomes

(46)  $-\dfrac{c - (n - 2)^2}{\xi^2} \ge \xi^2 f^2(\xi^2) - 4f(\xi^2) - 4\xi^2 f'(\xi^2)$.

Let $t = \xi^2$, $\psi(t) = t f(t)$. Then

(47)  $-\dfrac{a^2}{t} \ge \dfrac{1}{t}\,\psi^2(t) - 4\psi'(t)$,

(48)  $\dfrac{\psi'(t)}{a^2 + \psi^2(t)} \ge \dfrac{1}{4t}$,

where $a^2 = c - (n - 2)^2$. From the inequality (39) (for all $t \ge t_0$), which is weaker than (47), we concluded that $\psi(t) \le 0$. Consequently for $t_0 < t$

(49)  $\tan^{-1}\dfrac{\psi(t)}{a} - \tan^{-1}\dfrac{\psi(t_0)}{a} \ge \dfrac{a}{4}\log\dfrac{t}{t_0}$.

The left-hand side is bounded (since $\psi$ does not change sign) and the right-hand side approaches $+\infty$ as $t \to \infty$, which is a contradiction.
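
To see where the constant $(n - 2)^2$ comes from, it may help to evaluate the right-hand side of the bound $R(\xi) \ge n + \xi^2\varphi^2(\xi^2) - 2n\varphi(\xi^2) - 4\xi^2\varphi'(\xi^2)$ for the particular bias function $\varphi(t) = (n - 2)/t$, which is the case $f \equiv 0$ of the substitution used above. The sketch below does this numerically; the function names and the sample values of $n$ and $t$ are illustrative assumptions.

```python
def information_bound(n, t, phi, dphi):
    """n + t phi(t)^2 - 2 n phi(t) - 4 t phi'(t), the lower bound on the risk at xi^2 = t."""
    return n + t * phi(t) ** 2 - 2.0 * n * phi(t) - 4.0 * t * dphi(t)


def bound_with_f_zero(n, t):
    """Evaluate the bound for phi(t) = (n - 2) / t, whose derivative is -(n - 2) / t^2."""
    phi = lambda s: (n - 2.0) / s
    dphi = lambda s: -(n - 2.0) / (s * s)
    return information_bound(n, t, phi, dphi)


if __name__ == "__main__":
    n = 5
    for t in (10.0, 100.0, 1000.0):
        # The two printed numbers should agree: the bound reduces to n - (n - 2)^2 / t.
        print(t, bound_with_f_zero(n, t), n - (n - 2.0) ** 2 / t)
```

For this $\varphi$ the bound equals $n - (n - 2)^2/\xi^2$ exactly, which matches the asymptotic improvement obtained in section 2 with $b = n - 2$.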
5.  Miscellaneous  remarks


In this section I shall indicate a few of the many problems which must be solved before the methods suggested in this paper can be applied with confidence in all situations where they seem appropriate.

(i) It seems that similar improvements must be possible if the variance is unknown, but there is available a reasonable number of degrees of freedom for estimating the variance. Presumably the correction to the sample mean will be smaller with a given estimated variance than if that value were known to be the variance. If there are no additional degrees of freedom for estimating the variance it is clear that the usual estimator is admissible. For, if there is a better estimator we can (because of the convexity of the loss function) construct a continuous estimator which is also better than $\hat\xi_0$ by taking, for example,

(50)  e-yy^dy  . 

Then there is an $\epsilon > 0$ and a disc $S$ of radius at least $\epsilon$ such that

(51)  $[\hat\xi'(x) - x]^2 \ge \epsilon^2$

for all $x \in S$. If the variance of each component is much less than $\epsilon^2/n$ then the mean squared error of $\hat\xi_0$ will be small compared with $\epsilon^2$ whereas that of $\hat\xi'$ will not.

(ii) The (positive definite) matrix of the quadratic loss function may be different from the inverse of the covariance matrix. It is intuitively clear that the usual estimator must be inadmissible provided there are at least three characteristic roots which do not differ excessively. However, because of the lack of spherical symmetry it seems difficult to select a good estimator.

(iii) The covariance matrix may be wholly or partially unknown. Suppose for example that the covariance matrix is completely unknown but there are enough degrees of freedom to estimate it. For simplicity suppose the matrix of the quadratic loss function is the inverse of the covariance matrix. Again it seems likely that the usual estimator is inadmissible. The problem of finding an admissible estimator better than the usual one is complicated by the fact that there is at present no reason to believe that the usual estimator of the covariance matrix is a good one.

(iv) At least two essentially different sequential problems suggest themselves. First we may consider observing, one at a time, random $n$-vectors $X^{(1)}, X^{(2)}, \ldots$ independently normally distributed with common unknown mean vector $\xi$ and the identity matrix as covariance. If we want to attain a certain upper bound to the mean squared error with as small an expected number of observations as possible, it seems likely that we must resort to a sequential scheme.

Also, consider the situation where we observe, one at a time, real random variables $X_1, \ldots, X_n$ (with $n$ fixed but large) independently normally distributed with unknown means $\xi_1, \ldots, \xi_n$ and variance 1. Suppose we want to estimate the $\xi_i$ with the sum of squared errors as loss, but we are forced to estimate $\xi_i$ immediately after observing $X_i$. It is not clear whether it is admissible to estimate $\xi_i$ to be $X_i$. However, if it is admissible, it can only be because of the usual unrealistic assumption of a malevolent nature. In any case it should be possible to devise a scheme which will improve the estimate considerably if the means are small without doing much harm if they are large.

(v) It is not clear whether, in testing problems, the usual test may be inadmissible for the reasons given in this paper. This uncertainty cannot arise in the problem of testing for the variance of a normal distribution with unknown means as nuisance parameters (see p. 503, [9]). However, in the case of distinguishing between two possible values of the ratio of a mean to the standard deviation, with unknown means as nuisance parameters, the situation is unclear. Also the inadvisability (but not inadmissibility) of using spherical symmetry in a space of extremely high dimension is clear, at least if there is any natural way of breaking up the space.

(vi) It is of some interest to examine the situation in which the result of a previous experiment of the same type is taken as the origin. In this case (assuming the usual method, not that of this paper, has been applied to the previous experiment) $\xi^2$ is distributed as $\sigma^2\chi_n^2$, where $\sigma^2$ is the variance of each component in the first experiment, so that for large $n$, $\xi^2$ is nearly $\sigma^2 n$. Consequently the expected loss for the final estimate is nearly

(52)  $\dfrac{n\sigma^2}{1 + \sigma^2}$

which is the loss that would be attained if the two experiments were combined with weights inversely proportional to the variances. This method can be applied even if $\sigma^2$ is unknown.

(vii) Of course if we are interested in estimating only $\xi_1$, the presence of other unknown means $\xi_2, \ldots, \xi_n$ cannot make our task any easier. Thus our gain in over-all mean squared error must be accompanied by a deterioration in the mean squared error for certain components. Let us investigate this situation to the crudest approximation. We suppose without essential loss of generality that $\xi_1 = \sqrt{\xi^2}$, $\xi_2 = \cdots = \xi_n = 0$. Also we suppose $n$ large and put $\rho^2 = \xi^2/n$. Then the best spherically symmetric estimator gives nearly the same result as $[1 - 1/(1 + \rho^2)]X$. Of course $X_1 = \sqrt{\xi^2} + Y_1 = \rho\sqrt{n} + Y_1$ where $Y_1$ is normally distributed with mean 0 and variance 1. The bias introduced in the estimate of the first coordinate is thus approximately $\rho\sqrt{n}/(1 + \rho^2)$ which makes a contribution of $n\rho^2/(1 + \rho^2)^2$ to the mean squared error of the estimate of this component. This attains its maximum of $n/4$ for $\rho = 1$. We notice that at this value of $\rho$, the squared errors of all other components combined add up to approximately the same amount $n/4$. For certain purposes this extreme concentration of the error in one component may be intolerable. This is one more reason why a space of extremely large dimension should be broken up before the methods of this paper are applied.

(viii) Better approximations than we have given here will be needed before this method can be applied to obtain simultaneous confidence sets for the means. Nevertheless it seems clear that we shall obtain confidence sets which are appreciably smaller geometrically than the usual discs centered at the sample mean vector.

(ix)  For  certain  loss  functions,  for  example 


(53)  $L(\xi, d) = \sup_i |\xi_i - d_i|$,

little or no improvement over the usual estimator may be possible.
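
The arithmetic behind remarks (vi) and (vii) can be checked with a few lines of code. The sketch below compares $n\sigma^2/(1 + \sigma^2)$ with the loss of an inverse-variance weighted combination of two experiments, and evaluates the squared bias $n\rho^2/(1 + \rho^2)^2$ of the first coordinate in remark (vii). The expression $n\rho^4/(1 + \rho^2)^2$ used for the remaining coordinates is an assumption of this sketch: it is what one gets by shrinking unit-variance noise by the factor $\rho^2/(1 + \rho^2)$ and treating $n - 1$ as $n$. All function names are hypothetical.

```python
def combined_loss(n, sigma2):
    """Total loss when estimates with per-component variances sigma2 and 1
    are combined with weights inversely proportional to the variances."""
    per_component = 1.0 / (1.0 / sigma2 + 1.0)
    return n * per_component


def nearly_attained_loss(n, sigma2):
    """The expression n sigma^2 / (1 + sigma^2) from remark (vi)."""
    return n * sigma2 / (1.0 + sigma2)


def first_component_bias_sq(n, rho):
    """Remark (vii): squared bias n rho^2 / (1 + rho^2)^2 of the first coordinate."""
    return n * rho ** 2 / (1.0 + rho ** 2) ** 2


def other_components_error(n, rho):
    """Assumed approximation n rho^4 / (1 + rho^2)^2 for the remaining coordinates."""
    return n * rho ** 4 / (1.0 + rho ** 2) ** 2


if __name__ == "__main__":
    n = 100
    print(combined_loss(n, 0.5), nearly_attained_loss(n, 0.5))
    for rho in (0.5, 1.0, 2.0):
        print(rho, first_component_bias_sq(n, rho), other_components_error(n, rho))
```

At $\rho = 1$ both quantities in the last loop equal $n/4$, so half of the total error $n\rho^2/(1 + \rho^2) = n/2$ is concentrated in a single coordinate.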